Background: Targeted resequencing has become the most widely used and cost-effective approach for identifying causative mutations of Mendelian diseases, for both diagnostic and research purposes. Owing to very rapid technological progress, NGS laboratories are expanding their capabilities to handle the increasing number of analyses. Several open source tools are available to build a generic variant calling pipeline, but a tool that can simultaneously execute multiple analyses and organize and categorize the samples is still missing.

Results: Here we describe VarGenius, a Linux-based command line software able to execute customizable pipelines for the analysis of multiple targeted resequencing data using parallel computing. VarGenius provides a database to store the output of the analysis (calling quality statistics, variant annotations, internal allelic variant frequencies) and sample information (personal data, genotypes, phenotypes). VarGenius can also perform the "joint analysis" of hundreds of samples with a single command, drastically reducing the time needed to configure and execute the analysis.
VarGenius executes the standard pipeline of the Genome Analysis Toolkit (GATK) best practices (GBP) for germline variant calling, annotates the variants using Annovar, and generates a user-friendly output displaying the results through a web page.
VarGenius has been tested on a parallel computing cluster with 52 machines with 120GB of RAM each. Under this configuration, a 50M whole exome sequencing (WES) analysis for a family (trio or quartet) was executed in about 7 h, a joint analysis of 30 WES in about 24 h, and the parallel analysis of 34 single samples from a 1M panel in about 2 h.

Conclusions: We developed VarGenius, a "master" tool that addresses the increasing demand for heterogeneous NGS analyses and allows maximum flexibility for downstream analyses.
It paves the way to a different kind of analysis, centered on cohorts rather than on singletons. Patient and variant information is stored in the database, and any output file can be accessed programmatically. VarGenius can be used for routine analyses by biomedical researchers with basic Linux skills, while giving computational biologists additional flexibility to develop their own algorithms for the comparison and analysis of data.
The software is freely available at: https://github.com/frankMusacchia/VarGenius